Dylan Feng

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

May 20, 2026 (via arXiv)

MJ1: Multimodal Judgment via Grounded Verification

Mar 09, 2026 (via arXiv)

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

Dec 10, 2025 (via arXiv)

AssistanceZero: Scalably Solving Assistance Games

Apr 09, 2025 (via arXiv)